Commentary on "The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators"

Author

  • Peter Grünwald
Abstract

In the field of prediction with expert advice, a standard goal is to sequentially predict data as well as the best expert in some reference set of 'expert predictors'. Universal data compression, a subfield of information theory, can be thought of as a special case. Here, the set of expert predictors is a statistical model, i.e. a family of probability distributions, and the predictions are scored using the logarithmic loss function, which, via the Kraft inequality, gives the procedure an interpretation in terms of data compression. A prediction strategy is a function that, for each n, given data x^n ≡ x_1, …, x_n, outputs a "predictive" probability distribution p(· | x^n) for X_{n+1}. For a given model M, the Shtarkov or Normalized Maximum Likelihood (NML) strategy relative to M is the prediction strategy that achieves the minimax optimal individual-sequence regret relative to M. NML has a number of drawbacks, detailed below, and is therefore often approximated by more convenient strategies such as Sequential Normalized Maximum Likelihood (SNML) or the Bayesian strategy. The latter predicts using the Bayesian predictive distribution for the model M, defined relative to some prior π, which is often taken to be Jeffreys' prior; in that case we abbreviate it to J.B. The text below has been written so as to be (hopefully) understandable for readers who do not know too many details of these concepts; for such details, see e.g. Grünwald (2007) and/or Kotlowski and Grünwald (2011) (KG from now on).
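As a concrete illustration (not taken from the commentary itself): for the Bernoulli model, Jeffreys' prior is Beta(1/2, 1/2), and the resulting J.B. predictive distribution is the classical Krichevsky–Trofimov estimator, which predicts a 1 with probability (k + 1/2)/(n + 1) after seeing k ones in n outcomes. A minimal Python sketch (function names are mine, chosen for illustration):

```python
import math

def kt_predictive(ones, n):
    """Jeffreys-prior (Beta(1/2, 1/2)) Bayesian predictive probability that the
    next symbol is 1, after observing `ones` ones among n binary outcomes.
    This is the Krichevsky-Trofimov estimator."""
    return (ones + 0.5) / (n + 1.0)

def cumulative_log_loss(xs):
    """Sequential log-loss (code length in nats) incurred by the J.B. strategy
    on a binary sequence xs, predicting each symbol from the ones before it."""
    loss, ones = 0.0, 0
    for n, x in enumerate(xs):
        p1 = kt_predictive(ones, n)
        loss -= math.log(p1 if x == 1 else 1.0 - p1)
        ones += x
    return loss
```

Via the Kraft inequality, `cumulative_log_loss(xs)` is (up to rounding) the length in nats of a code for `xs`; the regret of this strategy against the best Bernoulli expert grows like (1/2) log n, which is what makes Jeffreys' prior asymptotically attractive here.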


Similar resources

Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data

Kernel density estimators are the basic tools for density estimation in non-parametric statistics. The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in which the bandwidth is varied depending on the location of the sample points. In this paper, we initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...
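The variable-bandwidth idea can be sketched for the simple untruncated, one-dimensional case. The version below is the "balloon" variant, in which the bandwidth at a query point x is the distance from x to its k-th nearest sample point; this is an illustration of the general k-NN kernel idea, not the paper's left-truncated estimator:

```python
import numpy as np

def knn_kernel_density(x, sample, k=5):
    """k-nearest-neighbor kernel density estimate at a point x: a Gaussian-kernel
    estimator whose bandwidth is the distance from x to its k-th nearest
    sample point, so the bandwidth adapts to the local density of the data."""
    sample = np.asarray(sample, dtype=float)
    d = np.abs(sample - x)
    h = np.sort(d)[k - 1]                              # adaptive bandwidth
    u = (x - sample) / h
    kern = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    return kern.sum() / (len(sample) * h)
```

In sparse regions the k-th neighbor is far away, so the bandwidth widens and the estimate is smoothed; in dense regions it shrinks, preserving local detail.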


Asymptotic Efficiencies of the MLE Based on Bivariate Record Values from Bivariate Normal Distribution

Abstract. Maximum likelihood (ML) estimation based on bivariate record data is considered as the general inference problem. Assume that the process of observing k records is repeated m times, independently. The asymptotic properties, including consistency and asymptotic normality, of the ML estimates of the parameters of the underlying distribution are then established, when m is ...


On the Minimax Optimality of Block Thresholded Wavelets Estimators for ?-Mixing Process

We propose a wavelet-based regression function estimator for the estimation of the regression function for a sequence of ?-mixing random variables with a common one-dimensional probability density function. Some asymptotic properties of the proposed estimator based on block thresholding are investigated. It is found that the estimators achieve optimal minimax convergence rates over large class...


Estimation of Parameters for an Extended Generalized Half Logistic Distribution Based on Complete and Censored Data

This paper considers an Extended Generalized Half Logistic distribution. We derive some properties of this distribution and then we discuss estimation of the distribution parameters by the methods of moments, maximum likelihood and the new method of minimum spacing distance estimator based on complete data. Also, maximum likelihood equations for estimating the parameters based on Type-I and Typ...


Positive-Shrinkage and Pretest Estimation in Multiple Regression: A Monte Carlo Study with Applications

Consider the problem of predicting a response variable using a set of covariates in a linear regression model. If it is a priori known or suspected that a subset of the covariates does not significantly contribute to the overall fit of the model, a restricted model that excludes these covariates may be sufficient. If, on the other hand, the subset provides useful information, shrinkage meth...



Publication date: 2012